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Abstract 

In this work the dynamic compressive sensing (CS) problem of recovering sparse, correlated, time- 
varying signals from sub-Nyquist, non-adaptive, linear measurements is explored from a Bayesian per- 
spective. While there has been a handful of previously proposed Bayesian dynamic CS algorithms in the 
literature, the ability to perform inference on high-dimensional problems in a computationally efficient 
manner remains elusive. In response, we propose a probabilistic dynamic CS signal model that captures 
both amplitude and support correlation structure, and describe an approximate message passing algorithm 
that performs soft signal estimation and support detection with a computational complexity that is linear 
in all problem dimensions. The algorithm, DCS-AMP, can perform either causal filtering or non-causal 
smoothing, and is capable of learning model parameters adaptively from the data through an expectation- 
maximization learning procedure. We provide numerical evidence that DCS-AMP performs within 3 dB 
of oracle bounds on synthetic data under a variety of operating conditions. We further describe the result 
of applying DCS-AMP to two real dynamic CS datasets, as well as a frequency estimation task, to bolster 
our claim that DCS-AMP is capable of offering state-of-the-art performance and speed on real-world 
high-dimensional problems. 

I. Introduction 

In this work, we consider the dynamic compressive sensing (dynamic CS) problem, in which a 
sparse, vector-valued time series is recovered from a second time series of noisy, sub-Nyquist, linear 
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measurements. Such a problem finds application in, e.g., dynamic MRI [2], high-speed video capture IS, 
and underwater channel estimation |@]. 

Framed mathematically, the objective of the dynamic CS problem is to recover the time series 
{x^^\ . . . ,x^'^^}, where x^*) G is the signal at timestep t, from a time series of measurements, 
{y^^\ . . . ,y^^^}. Each € C*^ is obtained from the linear measurement process, 

yW=AWa;W + eW, t = l,...,T, (1) 

with e(*) representing corrupting noise. The measurement matrix A^*) (which may be time-varying or 
time-invariant, i.e., A*-*^ = A V t) is known in advance, and is generally wide, leading to an underdeter- 
mined system of equations. The problem is regularized by assuming that a;*^*^ is sparse (or compressible)]^ 
having relatively few non-zero (or large) entries. 

In many real-world scenarios, the underlying time-varying sparse signal exhibits substantial temporal 
correlation. This temporal correlation may manifest itself in two interrelated ways: (i) the support of the 
signal may change slowly over time ||2l, ||3l, ||5l-||8l, and (ii) the amplitudes of the large coefficients 
may vary smoothly in time. 

In such scenarios, incorporating an appropriate model of temporal structure into a recovery technique 
makes it possible to drastically outperform structure-agnostic CS algorithms. From an analytical stand- 
point, Vaswani and Lu demonstrate that the restricted isometry property (RIP) sufficient conditions for 
perfect recovery in the dynamic CS problem are significantly weaker than those found in the traditional 
single measurement vector (SMV) CS problem when accounting for the additional structure In 
this work, we take a Bayesian approach to modeling this structure, which contrasts those dynamic 
CS algorithms inspired by convex relaxation, such as the Dynamic LASSO iH and the Modified-CS 
algorithm Q. Our Bayesian framework is also distinct from those hybrid techniques that blend elements 
of Bayesian dynamical models like the Kalman filter with more traditional CS approaches of exploiting 
sparsity through convex relaxation ||2], lITOl or greedy methods ifTTI . 

In particular, we propose a probabilistic model that treats the time-varying signal support as a set of 
independent binary Markov processes and the time-varying coefficient amplitudes as a set of independent 
Gauss-Markov processes. As detailed in Section JIl this model leads to coefficient marginal distributions 
that are Bernoulli-Gaussian (i.e., "spike-and-slab"). Later, in Section |V] we describe a generalization of 
the aforementioned model that yields Bemoulli-Gaussian-mixture coefficient marginals with an arbitrary 

'without loss of generality, we assume aj'*^ is sparse/compressible in the canonical basis. Other sparsifying bases can be 
incorporated into the measurement matrix A'*' without changing our model. 
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number of mixture components. The models that we propose thus differ substantially from those used in 
other Bayesian approaches to dynamic CS, |[T2l and |[T3l . In particular, Sejdinovic et al. |[T2l combine a 
linear Gaussian dynamical system model with a sparsity-promoting Gaussian-scale-mixture prior, while 
Shahrasbi et al. [il3] employ a particular spike-and-slab Markov model that couples amplitude evolution 
together with support evolution. 

Our inference method also differs from those used in the alternative Bayesian dynamic CS algorithms 
|[T2l and 1131 . In 11121 . Sejdinovic et al. perform inference via a sequential Monte Carlo sampler [14|. 
Sequential Monte Carlo techniques are appealing for their applicability to complicated non-linear, non- 
Gaussian inference tasks like the Bayesian dynamic CS problem. Nevertheless, there are a number of 
important practical issues related to selection of the importance distribution, choice of the resampling 
method, and the number of sample points to track, since in principle one must increase the number of 
points exponentially over time to combat degeneracy liT4l . Additionally, Monte Carlo techniques can be 
computationally expensive in high-dimensional inference problems. An alternative inference procedure 
that has recently proven successful in a number of applications is loopy belief propagation (LBP) |15|. 
In |[T3l . Shahrasbi et al. extend the conventional LBP method proposed in 111 61 for standard CS under a 
sparse measurement matrix A to the case of dynamic CS under sparse A^^\ Nevertheless, the confinement 
to sparse measurement matrices is very restrictive, and, without this restriction, the methods of ||T3l . ||T6l 
become computationally intractable. 

Our inference procedure is based on the recently proposed framework of approximate message passing 
(AMP) ifTTll . and in particular its "turbo" extension |[T8l . AMP, an unconventional form of LBP, was 
originally proposed for standard CS with a dense measurement matrix ||lT'|, and its noteworthy properties 
include: (i) a rigorous analysis (as M, — )• oo with M/N fixed, under i.i.d. sub-Gaussian A) establishing 
that its solutions are governed by a state-evolution whose fixed points are optimal in several respects [19 1, 
and (ii) extremely fast runtimes (as a consequence of the fact that it needs relatively few iterations, each 
requiring only one multiplication by A and its transpose). The turbo-AMP framework originally proposed 
in ifTSl offers a way to extend AMP to structured-sparsity problems such as compressive imaging |[20l . 
joint communication channel/symbol estimation |[2TI . and — as we shall see in this work — the dynamic 
CS problem. 

Our work makes several contributions to the existing Uterature on dynamic CS. First and foremost, the 
DCS-AMP algorithm that we develop offers an unrivaled combination of speed (e.g., its computational 
complexity grows only linearly in the problem dimensions M, N, and T) and reconstruction accuracy, as 
we demonstrate on both synthetic and real-world signals. Ours is the first work to exploit the speed and 
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accuracy of loopy belief propagation (and, in particular, AMP) in the dynamic CS setting, accomplished 
by embedding AMP within a larger Bayesian inference algorithm. Second, we propose an expectation- 
maximization 1221 procedure to automatically learn the parameters of our statistical model, as described 
in Section |IVl avoiding a potentially complicated "tuning" problem. The ability to automatically calibrate 
algorithm parameters is especially important when working with real-world data, but is not provided by 
many of the existing dynamic CS algorithms (e.g., ||2], ||5], ll9l- |[T2l ). In addition, our learned model 
parameters provide a convenient and interpretable characterization of time-varying signals in a way 
that, e.g., Lagrange multipliers do not. Third, DCS-AMP provides a unified means of performing both 
filtering, where estimates are obtained sequentially using only past observations, and smoothing, where 
each estimate enjoys the knowledge of past, current, and future observations. In contrast, the existing 
dynamic CS schemes can support either filtering, or smoothing, but not both. 

A. Notation 

Boldfaced lower-case letters, e.g., a, denote column vectors, while boldfaced upper-case letters, e.g., 
A, denote matrices. The letter t is strictly used to index a timestep, t = 1,2, ... ,T, the letter n is 
strictly used to index the coefficients of a signal, n = I, . . . , N, and the letter m is strictly used to index 
the measurements, m = 1, . . . , M. The superscript indicates a timestep-dependent quantity, while a 
superscript without parentheses, such as ^, indicates a quantity whose value changes according to some 
algorithmic iteration index k. Subscript notations such as Xn^ are used to denote the n*'^ element of the 
vector while set subscript notation, e.g., Xg\ denotes the sub-vector of x^^^ consisting of indices 
contained in S. The m^^ row of the matrix A is denoted by aj^, an M-by-M identity matrix is denoted 
by Im, and a length- vector of ones is given by l^r. Finally, CM{a;b,C) refers to the circularly 
symmetric complex normal distribution that is a function of the vector a, with mean h and covariance 
matrix C. 

II. Signal Model 

We assume that the measurement process can be accurately described by the linear model of ([1}. We 
further assume that A*-*-* G C^^^^, t = 1, . . . , T, are measurement matrices known in advance, whose 
columns have been scaled to be of unit normj^ We model the noise as a stationary, circularly symmetric, 
additive white Gaussian noise (AWGN) process, with e^*) ^ 0^(0, a"^! m) V t. 

^Our algorithm can be generalized to support A^*' without equal-norm columns, a time-varying number of measurements, 
M'*', and real- valued matrices/signals as well. 
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As noted in Section Jl the sparse time series, {a;^*)}^^, often exhibits a high degree of correlation 
from one timestep to the next. In this work, we model this correlation through a slow time-variation 
of the signal support, and a smooth evolution of the amplitudes of the non-zero coefficients. To do so, 
we introduce two hidden random processes, and The binary vector G {0,1}^ 

describes the support of x^-^\ denoted S^*\ while the vector 0^*^ G describes the amplitudes of the 
active elements of a;*^*^. Together, s^*) and 0^*^ completely characterize a;^*) as follows: 

4^=4*^-# Vn,t. (2) 

Therefore, Sn^ = sets Xn^ = and n ^ while Sn^ = 1 sets Xn^ = 9n^ and n G 5^*^. 

To model slow changes in the support 5^*^ over time, we model the n*'^ coefficient's support across 
time, {sn^}J^i, as a Markov chain defined by two transition probabilities: =Pr{sn^= l\sn ^^=0}, 
and = Pr{sn'* = 0|sn = 1}, and employ independent chains across n = 1, . . . ,N . We further 
assume that each Markov chain operates in steady-state, such that Pr{sn^ = 1} = A Vn, t. This steady- 
state assumption implies that these Markov chains are completely specified by the parameters A and , 
which together determine the remaining transition probability = Apoj/(l — A). Depending on how 
is chosen, the prior distribution can favor signals that exhibit a nearly static support across time, or 
it can allow for signal supports that change substantially from timestep to timestep. For example, it can 
be shown that l/p,,! specifies the average run length of a sequence of ones in the Markov chains. 

The second form of temporal structure that we capture in our signal model is the correlation in active 
coefficient amplitudes across time. We model this correlation through independent stationary steady-state 
Gauss-Markov processes for each n, wherein {9n^}J^i evolves in time according to 

# = (!-«) [dt^^ - C ) + aw^J:^ + C, (3) 

where C G C is the mean of the process, Wn^ ~ CJ\f{0, p) is an i.i.d. circular white Gaussian perturbation, 
and a G [0, 1] controls the temporal correlation. At one extreme, a = 0, the amplitudes are totally 
correlated, (i.e., ^i*^ = On ^^), while at the other extreme, a = 1, the amplitudes evolve according to an 
uncorrelated Gaussian random process with mean Q. 

At this point, we would like to make a few remarks about our signal model. First, due to dD, it is 
clear that p{x^n\s^n,dn^') = 5{x£^ — Sn^On^), where 6{-) is the Dirac delta function. By marginalizing 
out Sn^ and 9n \ one finds that 

= (1 - A)5(xW) + ACA/-(xW;C,a2), (4) 
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where cj^ = is the steady-state variance of 6n\ Equation (|4l) is a BemouUi-Gaussian or "spike- 
and-slab" distribution, which is an effective sparsity-promoting prior due to the point-mass at = 0. 
Second, we observe that the amplitude random process, {0^*^ }?Li> evolves independently from the sparsity 
pattern random process, {s^*)}^^. As a result of this modeling choice, there can be significant hidden 
amplitudes 6n^ associated with inactive coefficients (those for which Sn^ = 0). Consequently, 6n^ should 
be viewed as the amplitude of Xn^ conditioned on Sn'^ = 1. Lastly, we note that higher-order Markov 
processes and/or more complex coefficient marginals could be considered within the framework we 
propose, however, to keep development simple, we restrict our attention to first-order Markov processes 
and Bernoulli-Gaussian marginals until Section jV] where we describe an extension of the above signal 
model that yields BernouUi-Gaussian-mixture marginals. 

III. The DCS-AMP Algorithm 

In this section we will describe the DCS-AMP algorithm, which efficiently and accurately estimates the 
marginal posterior distributions of {xn^}, {6n^}, and {sn^} from the observed measurements {y*-*^}^!, 
thus enabling both soft estimation and soft support detection. The use of soft support information is 
particularly advantageous, as it means that the algorithm need never make a firm (and possibly erroneous) 
decision about the support that can propagate errors across many timesteps. As mentioned in Section Jl 
DCS -AMP can perform either filtering or smoothing. 

The algorithm we develop is designed to exploit the statistical structure inherent in our signal model. 
By defining y to be the collection of all measurements, {y^^^}J=i (and defining x, s, and 6 similarly), 
the posterior joint distribution of the signal, support, and amplitude time series, given the measurement 
time series, can be expressed using B ayes' rule as 

T / M N \ 

p{x,s,e\y) « n n Ilp(^n\s^\0^^)p{s';iYt'^)p{e^^\9t'^) , (5) 

t=l \m=l n=l / 

where oc indicates proportionality up to a constant scale factor, p{sn^\sn^) = p{sn^), and p{9n^\6n^) = 
p{6n'^). By inspecting we see that the posterior joint distribution decomposes into the product of 
many distributions that only depend on small subsets of variables. A graphical representation of such 
decompositions is given by the factor graph, which is an undirected bipartite graph that connects the pdf 
"factors" of ^ with the random variables that constitute their arguments ||23l . In Table H we introduce 
the notation that we will use for the factors of our signal model, showing the correspondence between 
the factor labels and the underlying distributions they represent, as well as the specific functional form 
assumed by each factor. The associated factor graph for the posterior joint distribution of 1^ is shown 
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TABLE I: The factors, underlying distributions, and functional forms associated with our signal model 
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Fig. 1: Factor graph representation of the joint posterior distribution of (|5}. 



in Fig. [H labeled according to Table |T] Filled squares represent factors, while circles represent random 
variables. 

As seen in Fig. [T] all of the variables needed at a given timestep can be visualized as lying in a plane, 
with successive planes stacked one after another in time. We will refer to these planes as "frames". The 
temporal correlation of the signal supports is illustrated by the hn^ factor nodes that connect the s^n 
variable nodes between neighboring frames. Likewise, the temporal correlation of the signal amplitudes 
is expressed by the interconnection of Sn factor nodes and 9n^ variable nodes. For visual clarity, these 
factor nodes have been omitted from the middle portion of the factor graph, appearing only at indices 
n = 1 and n = N, but should in fact be present for all indices n = 1, . . . ,N. Since the measurements 



{Vm} are observed variables, they have been incorporated into the gm factor nodes. 
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The algorithm that we develop can be viewed as an approximate implementation of belief propagation 
(BP) [241, a message passing algorithm for performing inference on factor graphs that describe proba- 
bilistic models. When the factor graph is cycle-free, belief propagation is equivalent to the more general 
sum-product algorithm [23], which is a means of computing the marginal functions that result from 
summing (or integrating) a multivariate function over all possible input arguments, with one argument 
held fixed, (i.e., marginahzing out all but one variable). In the context of BP, these marginal functions are 
the marginal distributions of random variables. Thus, given measurements y and the factorization of the 
posterior joint distribution p{x,s,6\i)), DCS-AMP computes (approximate) posterior marginals of x^n , 
Sn\ and 9n^. In "filtering" mode, our algorithm would therefore return, e.g., p(x^n\{y^^^}\=i), while 
in "smoothing" mode it would return p(xn^|{y*^*'*}^]^). From these marginals, one can compute, e.g., 
minimum mean-squared error (MMSE) estimates. The factor graph of Fig. [T contains many short cycles, 
however, and thus the convergence of loopy BP cannot be guaranteed |[23l rl Despite this, loopy BP has 
been shown to perform extremely well in a number of different applications, including turbo decoding 
|[29l . computer vision [30], and compressive sensing |[l6]-|Il8l, 1201, II3T1-II331. 



A. Message scheduling 

In loopy factor graphs, there are a number of ways to schedule, or sequence, the messages that are 
exchanged between nodes. The choice of a schedule can impact not only the rate of convergence of the 
algorithm, but also the likelihood of convergence as well ll34l . We propose a schedule (an evolution of 
the "turbo" schedule proposed in |[T8l ) for DCS-AMP that is straightforward to implement, suitable for 
both filtering and smoothing applications, and empirically yields quickly converging estimates under a 
variety of diverse operating conditions. 

Our proposed schedule can be broken down into four distinct steps, which we will refer to using 
the mnemonics (into), (within), (out), and (across). At a particular timestep t, the (into) step involves 
passing messages that provide current beliefs about the state of the relevant support variables, {s^n}n=i^ 
and amplitude variables, {On^}!^^^, laterally into the dashed AMP box within frame t. (Recall Fig. [T]) 
The (within) step makes use of these incoming messages, together with the observations available in 
that frame, {y^ }m=i' exchange messages within the dashed AMP box of frame t, thus generating 
estimates of the marginal posteriors of the signal variables {x^n}n=i- Using these posterior estimates. 



'However, it is worth noting that in the past decade much work has been accomplished in identifying specific situations under 
which loopy BP is guaranteed to converge, e.g., 1191 . 1251 - 11281 . 
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the (out) step propagates messages out of the dashed AMP box, providing updated beliefs about the state 
of {sn^}^^i and {On^}^^^. Lastly, the (across) step involves transmitting messages across neighboring 
frames, using the updated beliefs about {sn^}^^-^ and {On^}^^^ to influence the beliefs about {sn^^^}n=i 
and {et'^}^^, (or {st'^}^^, and {9^'^}^^^)- 

The procedures for filtering and smoothing both start in the same way. At the initial t = 1 frame, steps 
(into), (within) and (out) are performed in succession. Next, step (across) is performed to pass messages 
from {sn^}^^i and {On^}^^^ to {sn^}^^i and {On^}^^^. Then at frame t = 2 the same set of steps are 
executed, concluding with messages propagating to {sn^}^^-^ and {On^}^^-^^. This process continues until 
steps (into), (within) and (out) have been completed at the terminal frame, T. At this point, DCS-AMP 
has completed what we call a single forward pass. If the objective was to perform filtering, DCS- 
AMP terminates at this point, since only causal measurements have been used to estimate the marginal 
posteriors. If instead the objective is to obtain smoothed, non-causal estimates, then information begins 
to propagate backwards in time, i.e., step (across) moves messages from {s^^}^^i and {9^^}^^i to 
{srr^''}^=i and {^i^^^^l^^p Steps (into), (within), (out), and (across) are performed at frame T — 1, 
with messages bound for frame T — 2. This continues until the initial frame is reached. At this point 
DCS-AMP has completed what we term as a single forward/backward pass. Multiple such passes, indexed 
by the variable k, can be carried out until a convergence criterion is met or a maximum number of passes 
has been performed. 

B. Implementing the message passes 

We now provide some additional details as to how the above four steps are implemented. To aid our 
discussion, in Fig. [2] we summarize the form of the messages that pass between the various factor graph 
nodes, focusing primarily on a single coefficient index n at an intermediate frame t. Directed edges 
indicate the direction that messages are moving. In the (across) phase, we only illustrate the messages 
involved in a forward pass for the amplitude variables, and leave out a graphic for the corresponding 
backward pass, as well as graphics for the support variable (across) phase. Note that, to be applicable at 
frame T, the factor node dn^^^ and its associated edge should be removed. The figure also introduces 
the notation that we adopt for the different variables that serve to parameterize the messages. We use the 
notation Ua^b{') to denote a message passing from node a to a connected node b. For Bernoulli message 
pdfs, we show only the non-zero probability, e.g., A^^ = i^i^(t)^^(t){sn'^ = 1). 

To perform step (into), the messages from the factors hn'' and hn^^^ to Sn^ are used to set 7f^*\ 
the message from Sn^ to fn \ Likewise, the messages from the factors dn^ and dn'^^^ to 9n^ are used 
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Fig. 2: A summary of the four message passing phases, including message notation and form. See the pseudocode of Table |ll] 

for the precise message update computations. 

to determine the message from 9n^ to fn\ When performing filtering, or the first forward pass of 
smoothing, no meaningful information should be conveyed from the hn^^^ and dn^^^ factors. This can 
be accomplished by initializing (A^\ k^*-*) with the values (^,0,oo). 

In step (within), messages must be exchanged between the {^n^}^^^ and { gm nodes. When A*^*^ 
is not a sparse matrix, this will imply a dense network of connections between these nodes. Consequently, 
the standard sum-product algorithm would require us to evaluate multi-dimensional integrals of non- 
Gaussian messages that grow exponentially in number in both N and M. This approach is clearly 
infeasible for problems of any appreciable size, and thus we turn to a simplification known as approximate 
message passing (AMP) II17II . Bll . 

At a high-level, AMP can be viewed as a simplification of loopy BP, employing central limit theorem 
arguments to approximate the sum of many non-Gaussian random variables as a Gaussian. Through 
a series of principled approximation steps (which become exact for sub-Gaussian A matrices in the 
large-system limit |[T9l ). AMP produces an iterative thresholding algorithm that requires only 0{MN) 
operations, dominated by matrix- vector products, to obtain posteriors on the {x^n}^_-^ variable nodes. 
The specifics of the iterative thresholding algorithm will depend on the signal prior under which AMP 
is operating IIBTI . but it is assumed that the joint prior decouples into independent (but not necessarily 
i.i.d.) priors on each coefficient Xn^. See Appendix lAl for additional background on AMP. 
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By viewing I'fW (*)(•) as a "local prior'l^ for x^n , we can readily apply an off-the-shelf AMP 
algorithm (e.g., IIBTI . ll35l . ll36l ) as a means of performing the message passes within the portions of 
the factor graph enclosed within the dashed boxes of Fig. [T] (only one such box is visible). The use of 
AMP with decoupled local priors within a larger message passing algorithm that accounts for statistical 
dependencies between signal coefficients was first proposed in |18|, and further studied in 11201 . lIlTI . 
|[32l . ll37l . |[38l . Here, we exploit this powerful "turbo" inference approach to account for the strong 
temporal dependencies inherent in the dynamic CS problem. 

The local prior for our signal model is a Bernoulli-Gaussian, namely 

The appropriate AMP message update equations for this local prior follow a straightforward extension 
of the derivations outlined in lITSl . which considered the special case of a zero-mean Bernoulli-Gaussian 
prior. The specific AMP updates for our model are given by (IA4) -( |A8l ) in Table HIl 

After employing AMP to manage the message passing between the {a;n^}^_^ and {^m }^_^ nodes 
in step (within), messages must be propagated out of the dashed AMP box of frame t (step (out)) and 
either forward or backward in time (step (across)). While step (across) simply requires a straightforward 
application of the sum-product message computation rules, step (out) imposes several difficulties which 
we must address. For the remainder of this discussion, we focus on a novel approximation scheme 
for specifying the message i^j(t)^g(t) (•)■ Our objective is to arrive at a message approximation that 
introduces negligible error while still leading to a computationally efficient algorithm. A Gaussian message 
approximation is a natural choice, given the marginally Gaussian distribution of 9n^. As we shall soon 
see, it is also a highly justifiable choice. 

A routine appUcation of the sum-product rules to the /„* -to-^n ■* message would produce the following 
expression: 

^';^\e^M^) - (1 -4*^)CAr(O;0^„c^) +iWCAr(0W;0t.t,cj). (6) 

Unfortunately, the term CA/'(0; 0L, c!) prevents us from normalizing z^'^'Jff' ,t. (On^), because it is constant 
with respect to 9n \ Therefore, the distribution on 6n^ represented by (|6]l is improper. To provide intuition 
into why this is the case, it is helpful to think of ly An {On^ ) as a message that conveys information 



''The AMP algorithm is conventionally run with static, i.i.d. priors for each signal coefficient. When utilized as a sub-component 
of a larger message passing algorithm on an expanded factor graph, the signal priors (from AMP's perspective) will be changing 
in response to messages from the rest of the factor graph. We refer to these changing AMP priors as local priors. 
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about the value of 9n^ based on the values of and Sn\ If Sn^ = 0, then by Xn^ = 0, thus making 
On'' unobservable. The constant term in (|6]l reflects the uncertainty due to this unobservability through 
an infinitely broad, uninformative distribution for 9n\ 

To avoid an improper pdf, we modify how this message is derived by regarding our assumed signal 
model, in which Sn^ € {0, 1}, as a limiting case of the model with G {e, 1} as e — > 0. For any fixed 
positive e, the resulting message i^jrw^gwi-) proper, given by 



where 

1 — vr) + e"^7r 



The pdf in ^ is that of a binary Gaussian mixture. If we consider e <^ 1, the first mixture component 
is extremely broad, while the second is more "informative," with mean (pl^ and variance c^. The relative 
weight assigned to each component Gaussian is determined by the term Q{TTn^). Notice that the limit of 
this weighting term is the simple indicator function 



lim 17 (vr) 



if0<7r<l, 

(9) 

1 if vr = 1. 



Since we cannot set e = 0, we instead fix a small positive value, e.g., e = 10~^. In this case, ^ 
could then be used as the outgoing message. However, this presents a further difficulty: propagating 
a binary Gaussian mixture forward in time would lead to an exponential growth in the number of 
mixture components at subsequent timesteps. This difficulty is a familiar one in the context of switched 
linear dynamical systems based on conditional Gaussian models, since such models are not closed under 
marginalization |[39l . To avoid the exponential growth in the number of mixture components, we collapse 
our binary Gaussian mixture to a single Gaussian component, an approach sometimes referred to as a 
Gaussian sum approximation POl . PTl . This can be justified by the fact that, for e <^ 1, behaves 
nearly like the indicator function in (|9]l, in which case one of the two Gaussian components will typically 
have negligible mass. 

To carry out the Gaussian sum approximation, we propose the following two schemes. The first is to 
simply choose a threshold r that is slightly smaller than 1 and, using (|9]l as a guide, threshold vf^*^ to 
choose between the two Gaussian components of The resultant message is thus 

^^;«^e<')(^i*^) = CA/-(0W;^i*\ V^l*^), (10) 
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with and ipn chosen according to 




< r 



> r 



(11) 



The second approach is to perform a second-order Taylor series approximation of —loguJ°f^^^t^{9n ) 
with respect to 9n\ The resultant quadratic form in On '^ can be viewed as the logarithm of a Gaussian 
kernel with a particular mean and variance, which can be used to parameterize a single Gaussian message, 
as described in ll32l . The latter approach has the advantage of being parameter-free. Empirically, we find 
that this latter approach works well when changes in the support occur infrequently, e.g., p^^ < 0.025, 
while the former approach is better suited to more dynamic environments. 

In Table HIl we provide a pseudo-code implementation of our proposed DCS-AMP algorithm that gives 
the explicit message update equations appropriate for performing a single forward pass. The interested 
reader can find an expanded derivation of the messages in [42 1. The primary computational burden of 
DCS-AMP is computing the messages passing between the {xn^} and {gm} nodes, a task which can 
be performed efficiently using matrix- vector products involving A*^*^ and A*^*^". The resulting overall 
complexity of DCS -AMP is therefore 0{TMN) flops (flops-per-pass) when filtering (smoothing) |f| The 
storage requirements are 0{N) and 0{TN) complex numbers when filtering and smoothing, respectively. 



The signal model of Section JI] is specified by the Markov chain parameters A, p^-^, the Gauss-Markov 
parameters C,, a, p, and the AWGN variance dg. It is likely that some or all of these parameters will require 
tuning in order to best match the unknown signal. To this end, we develop an expectation-maximization 
(EM) 1^ algorithm that works together with the message passing procedure described in Section IIII-AI 
to learn all of the model parameters in an iterative fashion from the data. 

The EM algorithm is appealing for two principal reasons. First, the EM algorithm is a well-studied and 
principled means of parameter estimation. At every EM iteration, the likelihood is guaranteed to increase 
until convergence to a local maximum occurs P3l . For multimodal Ukelihoods, local maxima will, in 
general, not coincide with the global maximum, but a judicious initialization of parameters can help in 
ensuring the EM algorithm reaches the global maximum [43 1. The second appealing feature of the EM 

^When they exist, fast implicit A'*' operators can provide significant computational savings in higli-dimensional problems. 
Implementing a Fourier transform as a fast Fourier transform (FFT) subroutine, for example, would drop DCS-AMP's complexity 
from OiTMN) to 0{TN log^ N). 



IV. Learning the signal model parameters 
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% Define soft-thresholding functions: 

G„t(</.; c) ^ (1 + 7„t (</,; c))-i ( + 7„t (0; c)|F„i (<^; c)|' 



c)^^F„t(</.,c) = lG„t(</.;c) 



X cxp I — 



]) 



(Dl) 

(D2) 
(D3) 

(D4) 



% Begin passing messages . . . 
for < = 1, . . . , T : 

% Execute the (into) phase . 



(1-A„ )-(l-A„ ) + A^ .A„ 



Vn 



-(t) 



Vn 



Vn 



% Initialize AMP-related variables . 



y)i>,\/n : = 0, and = 100 ■ En=i V' 



(t) 



Vm : 2^ 

% Execute the (within) phase using AMP . . . 
for i = 1, . . . , /, : 

4>l.t = E^=i^^^'n*^^„t + M;t Vn 
= F,,tWu;cl) Vn 
G„t(0^,,;cj) Vn 

"e Af Z^n=l %t 



end 



Vn 



% Store current estimate of x 



% Execute the (out) phase . . . 



(Al) 
(A2) 
(A3) 



Vn 



), o.w. 
% Execute the (across) phase forward in time . . . 

_ Pio(l->^k")(l-^^") + (l-Poi)Ak"^^" w 



Vn (e < 1) 



Vn 



(A4) 
(A5) 
(A6) 
(A7) 
(A8) 

(A9) 

(AlO) 

(All) 

(A12) 
(A13) 
(A14) 



end 



TABLE II: DCS-AMP steps for filtering mode, or the forward portion of a single forward/baclcward pass in smoothing mode. 
See Fig. [2] to associate quantities with the messages traversing the factor graph. 



algorithm lies in the fact that its expectation step leverages quantities that have already been computed 
in the process of executing DCS -AMP, making the EM procedure computationally efficient. 

We let r = {A,P(,^, C, a, p, fjg} denote the set of all model parameters, and let denote the set 
of parameter estimates at the k^^ EM iteration. The objective of the EM procedure is to find parameter 
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estimates that maximize the data likelihood p{y\T). Since it is often computationally intractable to perform 
this maximization, the EM algorithm incorporates additional "hidden" data and iterates between two steps: 
( i) evaluating the conditional expectation of the log likelihood of the hidden data given the observed data, 
y, and the current estimates of the parameters, T^, and (ii) maximizing this expected log likelihood with 
respect to the model parameters. For all parameters except the noise variance, Ug, we use s and 6 as the 
hidden data, while for fig we use x. 

Before running DCS-AMP, the model parameters are initialized using any available prior knowledge. 
If operating in smoothing mode, DCS-AMP performs an initial forward/backward pass, as described 
in Section IIII-AI Upon completing this first pass, estimates of the marginal posterior distributions are 
available for each of the underlying random variables. Additionally, belief propagation can provide 
pairwise joint posterior distributions, e.g., p(^Sn^ , Sn \y) , for any variable nodes connected by a common 
factor node [44|. With these marginal, and pairwise joint, posterior distributions, it is possible to produce 
closed-form solutions for performing steps (i) and (ii) above. We adopt a Gauss-Seidel scheme, performing 
coordinate-wise maximization, e.g., 

logp(j/,8,0;A,r^{A'=})|j/;r'=" . 

The EM procedure is performed after each forward/backward pass, leading to a convergent sequence of 
parameter estimates. If operating in filtering mode, the procedure is similar, however the EM procedure 
is run after each recovered timestep using only causally available posterior estimates. 

In Table |llll we provide the EM update equations for each of the parameters of our signal model, 
assuming DCS-AMP is operating in smoothing mode. A complete derivation of each update can be 
found in Il42t . 

V. Incorporating Additional Structure 

In Sections |ll] - HY] we described a signal model for the dynamic CS problem and summarized a 
message passing algorithm for making inferences under this model, while iteratively learning the model 
parameters via EM. We also hinted that the model could be generalized to incorporate additional, or 
more complex, forms of structure. In this section we will elaborate on this idea, and illustrate one such 
generalization. 

Recall that, in Section |II1 we introduced hidden variables s and 6 in order to characterize the structure 
in the signal coefficients. An important consequence of introducing these hidden variables was that they 
made each signal coefficient Xn^ conditionally independent of the remaining coefficients in x, given Sn^ 



X^~^^ = argmaxE^ g 
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%> Define key quantities obtained from AIVIP-IVIIVIV at iteration k: 



,-,(*) 



(t) jt-i) 



1 



Hn — b-lfn \y\ — 'I 



l*')(l-Ak")) 


(Ql) 


= l|y) 


(Q2) 




(Q3) 




(Q4) 



^ var{a;L*^ |y} % See (A6) of Table|ll] 
^ti*' = E [zl'' I y] % See (A5) of Table M 



% ElVl update equations: 



(t-i)L-.l 



2ifc Z^n = l P' 



(1) 



+EL2E^=i^(/ii*'-(i-«'=)^r^')) 

i_^(b_^b^ + 8jv(r-i)c) 



4JV(T 
where: 

f^^EL2E^=i5He{£[0i"*e,ri'|g]} 
A' 



(El) 
(E2) 

(E3) 
(E4) 



■ — „fc Z^t=2Z^n = l "1 



(c«'=)2]V(T-l) 



{t)*fl{t-l)|_^^^ 



-,(i)|2 



+(„fe)2|^ft|2 _2(i_„fe)9^e{i5[e£*)*9£*-i)|y]} 
-2a'=?Rc{/ii*^*C'=} + 2a*=(l - a'=)lHc{AL'"^'*C''} 
+(l-a'=)(Cr^) + |Ar''P) 



(E5) 
(E6) 



TABLE III: EM update equations for the signal model parameters of Section HH 

and 9n^. This conditional independence served an important algorithmic purpose since it allowed us to 
apply the AMP algorithm, which requires independent local priors, within our larger inference procedure. 

One way to incorporate additional structure into the signal model of Section JI] is to generalize our 
choices of p{s) and p{6). As a concrete example, pairing the temporal support model proposed in this 
work with the Markovian model of wavelet tree inter-scale correlations described in ||20l through a 
more complex support prior, p{s), could enable even greater undersampling in a dynamic MRI setting. 
Performing inference on such models could be accomplished through the general algorithmic framework 
proposed in ||38l . As another example, suppose that we wish to expand our Bernoulli-Gaussian signal 
model to one in which signal coefficients are marginally distributed according to a Bernoulli-Gaussian- 
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mixture, i.e., 

D 



d=l 

where Ylid=o ~ ^- Since we still wish to preserve the slow time-variations in the support and 
smooth evolution of non-zero amplitudes, a natural choice of hidden variables is {s, 6i, . . . , Od}, where 
Sn^ G {0, 1, . . . , D}, and € C, d = 1, . . . , D. The relationship between and the hidden variables 
then generalizes to: 



To model the slowly changing support, we specify p{s) using a {D + l)-state Markov chain defined 
by the transition probabilities pod — Prj^n^ = 0\sn = d} and p^o — Pr{sn^ = d\sn = 0}, d = 
1, . . . , In this work, we assume that state transitions cannot occur between active mixture components, 
i.e., Pr(s^i*'' = d\sn = e) = when d 7^ e / oj^ For each amplitude time-series we again use 
independent Gauss-Markov processes to model smooth evolutions in the amplitudes of active signal 
coefficients, i.e., 

where u;i*^~CA/'(0,prf). 

As a consequence of this generalized signal model, a number of the message computations of Sec- 
tion IIII-BI must be modified. For steps (into) and (across), it is largely straightforward to extend the 
computations to account for the additional hidden variables. For step (within), the modifications will 
affect the AMP thresholding equations defined in (ID II) - (ID4I ) of Table |lll Details on a Bemoulli-Gaussian- 
mixture AMP algorithm can be found in ll33l . For the (out) step, we will encounter difficulties applying 
standard sum-product update rules to compute the messages {i^ j^w (OI^Li- the Bernoulli- 

Gaussian case, we consider a modification of our assumed signal model that incorporates an e <C 1 
term, and use Taylor series approximations of the resultant messages to collapse a {D + l)-ary Gaussian 
mixture to a single Gaussian. More information on this procedure can be found in Il42l . 



*By relaxing this restriction on active-to-active state transitions, we can model signals whose coefficients tend to enter the 
support set at small amplitudes that grow larger over time through the use of a Gaussian mixture component with a small 
variance that has a high probability of transitioning to a higher variance mixture component. 



18 



VI. Empirical Study 

We now describe the results of an empirical study of DCS- AMP The primary performance metric that 
we used in all of our experiments, which we refer to as the time-averaged normalized MSE (TNMSE), 
is defined as 



where x^^^ is an estimate of x^^h 

Unless otherwise noted, the following settings were used for DCS-AMP in our experiments. First, 
DCS-AMP was run as a smoother, with a total of 5 forward/backward passes. The number of inner 
AMP iterations I for each (within) step was I = 25, with a possibility for early termination if the 
change in the estimated signal, /xj, fell below a predefined threshold from one iteration to the next, i.e., 
jfWul — /xj~^||2 < 10^^. Equation ( IA9I ) of Table JI] was used to produce x^^\ which corresponds to an 
MMSE estimate of x^^^ under DCS-AMP's estimated posteriors p{xn^\y). The amplitude approximation 
parameter e from (|7]l was set to e = 10^'', while the threshold r from ([TTT i was set to r = 0.99. In our 
experiments, we found DCS-AMP's performance to be relatively insensitive to the value of e provided 
that e ^ 1. The choice of r should be selected to provide a balance between allowing DCS-AMP to track 
amplitude evolutions on signals with rapidly varying supports and preventing DCS -AMP from prematurely 
gaining too much confidence in its estimate of the support. We found that the choice r = 0.99 works 
well over a broad range of problems. When the estimated transition probability p^^ < 0.025, DCS-AMP 
automatically switched from the threshold method to the Taylor series method of computing ([TOl i. which 
is advantageous because it is parameter-free. 

When learning model parameters adaptively from the data using the EM updates of Table |llll it 
is necessary to first initialize the parameters at reasonable values. Unless domain-specific knowledge 
suggests a particular initialization strategy, we advocate using the following simple heuristics: The initial 
sparsity rate, A^, active mean, active variance, (d^)^, and noise variance, ((Jg)^, can be initialized 
according to the procedure described in [|33i ^Y]^ The Gauss-Markov correlation parameter, a, can be 
initialized as 



For problems with a high degree of undersampling and relatively non-sparse signals, it may be necessary to threshold the 
value for suggested in 1331 so that it does not fall below, e.g., 0.10. 




t=l 




(12) 



'Code for reproducing our results is available at |http://www.ece.osu.edu/~schniter/turboAMPdcS| 
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The active-to-inactive transition probability, p^^ , is difficult to gauge solely from sample statistics involving 
the available measurements, y. We used p^^ = 0.10 as a generic default choice, based on the premise 
that it is easier for DCS-AMP to adjust to more dynamic signals once it has a decent "lock" on the static 
elements of the support, than it is for it to estimate relatively static signals under an assumption of high 
dynamicity. 

A. Performance across the sparsity-undersampling plane 

Two factors that have a significant effect on the performance of any CS algorithm are the sparsity 
I^S*-*^! of the underlying signal, and the number of measurements M. Consequently, much can be learned 
about an algorithm by manipulating these factors and observing the resulting change in performance. To 
this end, we studied DCS-AMP's performance across the sparsity-undersampling plane ||45l , which is 
parameterized by two quantities, the normalized sparsity ratio, /? = E[|5*^*^ |]/M, and the undersampling 
ratio, 6 = M/N . For a given {5, (3) pair (with N fixed at 1500), sample realizations of s, 6, and e were 
drawn from their respective priors, and elements of a time-varying A^*^ were drawn from i.i.d. zero-mean 
complex circular Gaussians, with all columns subsequently scaled to have unit ^2-norm, thus generating 
X and y. 

As a performance benchmark, we used the support-aware Kalman smoother In the case of linear 
dynamical systems with jointly Gaussian signal and observations, the Kalman filter (smoother) is known 
to provide MSE-optimal causal (non-causal) signal estimates H6l . When the signal is Bernoulli-Gaussian, 
the Kalman filter/smoother is no longer optimal. However, a lower bound on the achievable MSE can 
be obtained using the support-aware Kalman filter (SKF) or smoother (SKS). Since the classical state- 
space formulation of the Kalman filter does not easily yield the support-aware bound, we turn to an 
alternative view of Kalman filtering as an instance of message passing on an appropriate factor graph 
BTl . For this, it suffices to use the factor graph of Fig. [T]with {sn^} treated as fixed, known quantities. 
Following the standard sum-product algorithm rules results in a message passing algorithm in which all 
messages are Gaussian, and no message approximations are required. Then, by running loopy Gaussian 
belief propagation until convergence, we are guaranteed that the resultant posterior means constitute the 
MMSE estimate of x [25, Claim 5]. 

To quantify the improvement obtained by exploiting temporal correlation, signal recovery was also 
explored using the Bernoulli-Gaussian AMP algorithm (BG-AMP) independently at each timestep (i.e., 
ignoring temporal structure in the support and amplitudes) , accomplished by passing messages only within 
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Support-aware Kalman smoother TNMSE [dB] DCS-AMP TNMSE [dB] EM-DCS-AMP TNMSE [dB] BG-AMP TNMSE [dB] 




Fig. 3: A plot of the TNMSE (in dB) of (from left) the SKS, DCS-AMP, EM-DCS- AMP, and BG-AMP across the 
sparsity-undersampling plane, for temporal correlation parameters — 0.05 and a = 0.01. 



the dashed boxes of Fig. [T] using p[xn ) from ^ as AMP's priori^ 

In Fig. [3j we present four plots from a representative experiment. The TNMSE across the (loga- 
rithmically scaled) sparsity-undersampling plane is shown for (working from left to right) the SKS, 
DCS-AMP, EM-DCS-AMP (DCS-AMP with EM parameter tuning), and BG-AMP In order to allow 
EM-DCS-AMP ample opportunity to converge to the correct parameter values, it was allowed up to 300 
EM iterations/smoothing passes, although it would quite often terminate much sooner if the parameter 
initializations were reasonably close. The results shown were averaged over more than 300 independent 
trials at each {6, /3) pair. For this experiment, signal model parameters were set at = 1500, T = 25, 
= 0.05, = 0, a = 0.01, (T^ = 1, and a noise variance, cjg, chosen to yield a signal-to-noise 
ratio (SNR) of 25 dB. (M, A) were set based on specific {6, f]) pairs, and p^^ was set so as to keep 
the expected number of active coefficients constant across time. It is interesting to observe that the 
performance of the SKS and (EM-)DCS-AMP are only weakly dependent on the undersampling ratio 6. 
In contrast, the structure-agnostic BG-AMP algorithm is strongly affected. This is one of the principal 
benefits of incorporating temporal structure; it makes it possible to tolerate more substantial amounts of 
undersampling, particularly when the underlying signal is less sparse. 

B. Performance vs p^^ and a 

The temporal correlation of our time-varying sparse signal model is largely dictated by two parameters, 
the support transition probability p^-^ and the amplitude forgetting factor a. Therefore, it is worth inves- 
tigating how the performance of (EM-)DCS-AMP is affected by these two parameters. In an experiment 

'Experiments were also ran that compared performance against Basis Pursuit Denoising (BPDN) 1481 with genie-aided 
parameter tuning (solved using the SPGLl solver f49l). However, this was found to yield higher TNMSE than BG-AMP, and 
at higher computational cost. 



21 



Support-aware Kalman smoother TNMSE [dB] DCS-AMP TNMSE [dB] EM-DCS-AMP TNMSE [dB] 




Fig. 4: TNMSE (in dB) of (from left) the SKS, DCS-AMP, and EM-DCS-AMP as a function of the model parameters p^i and 
Of, for undersampling ratio 5 = 1/3 and sparsity ratio /3 — 0.45. BG-AMP achieved a TNMSE of —5.9 dB across the plane. 

similar to that of Fig. |3l we tracked the performance of (EM-)DCS-AMP, the SKS, and BG-AMP across 
a plane of (p^^ , a) pairs. The active-to-inactive transition probability was swept linearly over the 
range [0, 0.15], while the Gauss-Markov amplitude forgetting factor a was swept logarithmically over the 
range [0.001,0.95]. To help interpret the meaning of these parameters, we note that the fraction of the 
support that is expected to change from one timestep to the next is given by 2pg^, and that the Pearson 
correlation coefficient between temporally adjacent amplitude variables is 1 — a. 

In Fig. H we plot the TNMSE (in dB) of the SKS and (EM-)DCS-AMP as a function of the percentage 
of the support that changes from one timestep to the next (i.e., 2^,,^ x 100) and the logarithmic value of 
a for a signal model in which 6=1/5 and /? = 0.60, with remaining parameters set as before. Since 
BG-AMP is agnostic to temporal correlation, its performance is insensitive to the values of p^^ and a. 
Therefore, we do not include a plot of the performance of BG-AMP, but note that it achieved a TNMSE 
of —5.9 dB across the plane. For the SKS and (EM-)DCS-AMP, we see that performance improves with 
increasing amplitude correlation (moving leftward). This behavior, although intuitive, is in contrast to the 
relationship between performance and correlation found in the MMV problem |[32l . llSOil . primarily due 
to the fact that the measurement matrix is static for all timesteps in the classical MMV problem, whereas 
here it varies with time. Since the SKS has perfect knowledge of the support, its performance is only 
weakly dependent on the rate of support change. DCS-AMP performance shows a modest dependence on 
the rate of support change, but nevertheless is capable of managing rapid temporal changes in support, 
while EM-DCS-AMP performs very near the level of the noise when a < 0.10. 
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C. Recovery of an MRI image sequence 

While the above simulations demonstrate the effectiveness of DCS -AMP in recovering signals generated 
according to our signal model, it remains to be seen whether the signal model itself is suitable for 
describing practical dynamic CS signals. To address this question, we tested the performance of DCS- 
AMP on a dynamic MRI experiment first performed in ||5T1 . The experiment consists of recovering 
a sequence of 10 MRI images of the larynx, each 256 x 256 pixels in dimension. (See Fig. [5]) The 
measurement matrices were never stored explicitly due to the prohibitive sizes involves, but were instead 
treated as the composition of three linear operations, A = MFW~^ . The first operation, W"^ , was 
the synthesis of the underlying image from a sparsifying 2-D, 2-level Daubechies-4 wavelet transform 
representation. The second operation, F, was a 2-D Fourier transform that yielded the k-space coefficients 
of the image. The third operation, M, was a sub-sampling mask that kept only a fraction of the available 
k-space data. 

Since the image transform coefficients are compressible rather than sparse, the SKF/SKS no longer 
serves as an appropriate algorithmic benchmark. Instead, we compare performance against Modified-CS 
|[9l . as well as timestep-independent Basis Pursuit!^ As reported in ||9], Modified-CS demonstrates that 
substantial improvements can be obtained over temporally agnostic methods. 

Since the statistics of wavelet coefficients at different scales are often highly dissimilar (e.g., the 
coarsest-scale approximation coefficients are usually much less sparse than those at finer scales, and are 
also substantially larger in magnitude), we allowed our EM procedure to learn different parameters for 
different wavelet scales. Using Qi to denote the indices of the coarsest-scale "approximation" coefficients, 
and Q2 to denote the finer-scale "wavelet" coefficients, DCS-AMP was initialized with the following 
pai-ameter choices: = 0.99, = 0.01, = 0.01, Csi = CG2 = 0' (^Qi = = 0-05, pg^ = 10^, 
PQ2 = 10^, and cTg = 0.01, and run in filtering mode with / = 10 inner AMP iterations. 

We note that our initializations were deliberately chosen to be agnostic, but reasonable, values. In 
particular, observing that the coarsest-scale approximation coefficients of a wavelet decomposition are 
almost surely non-zero, we initialized the associated group's sparsity rate at Ag^ = 0.99, while the finer 
scale detail coefficients were given an arbitrary sparsity-promoting rate of Xg^ = 0.01. The choices of a 
and p were driven by an observation that the variance of coefficients across wavelet scales often differs by 

'"Modified-CS is available at|http://home.engineering.iastate.edu/~luwei/modcs/index.html| Basis Pursuit was solved using the 
^i-MAGIC equality-constrained primal-dual solver (chosen since it is used as a subroutine within Modified-CS), available at 
|http:/7users.ece.gatech.edu/~justiri7limagic/| 
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Algorithm 


TNMSE (dB) 


Runtime 


Basis Pursuit 


-17.22 


47 min 


Modified-CS 


-34.30 


7.39 hrs 


DCS-AMP (Filter) 


-34.62 


8.08 sec 



TABLE IV: Performance on dynamic MRI dataset from 1511 with increased sampling rate at initial timestep. 



Algorithm 


TNMSE (dB) 


Runtime 


Basis Pursuit 


-16.83 


47.61 min 


Modified-CS 


-17.18 


7.78 hrs 


DCS-AMP (Filter) 


-29.51 


7.27 sec 



TABLE V: Performance on dynamic MRI dataset from 1511 with identical sampling rate at every timestep. 

an order-of-magnitude. The noise variance is arguably the most important parameter to initialize properly, 
since it balances the conflicting objectives of fitting the data and adhering to the assumed signal model. 
Our rule-of-thumb for initializing this parameter was that it is best to err on the side of fitting the data 
(since the SNR in this MRI data collection was high), and thus we initialized the noise variance with a 
small value. 

In Table [TVl we summarize the performance of three different estimators: timestep-independent Basis 
Pursuit, which performs independent £i minimizations at each timestep, Modified-CS, and DCS -AMP 
(operating in filtering mode). In this experiment, per the setup described in lISTl . the initial timestep was 
sampled at 50% of the Nyquist rate, i.e., M = N/2, while subsequent timesteps were sampled at 16% of 
the Nyquist rate. Both Modified-CS and DCS-AMP substantially outperform Basis Pursuit with respect 
to TNMSE, with DCS-AMP showing a slight advantage over Modified-CS. Despite the similar TNMSE 
performance, note that DCS-AMP runs in seconds, whereas Modified-CS takes multiple hours. In Fig. |5j 
we plot the true images along with the recoveries for this experiment, which show severe degradation 
for Basis Pursuit on all but the initial timestep. 

In practice, it may not be possible to acquire an increased number of samples at the initial timestep. 
We therefore repeated the experiment while sampling at 16% of the Nyquist rate at every timestep. 
The results, shown in Table jV] show that DCS-AMP's performance degrades by about 5 dB, while 
Modified-CS suffers a 14 dB reduction, illustrating that, when the estimate of the initial support is poor, 
Modified-CS struggles to outperform Basis Pursuit. 
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Fig. 5: Frames 1, 2, 5, and 10 of the dynamic MRI image sequence of (from top to bottom): the fully sampled dataset, Basis 
Pursuit, Modified-CS, and DCS-AMP, with increased sampling rate at initial timestep. 

D. Recovery of a CS audio sequence 

In another experiment using real-worid data, we used DCS -AMP to recover an audio signal from 
sub-Nyquist samples. In this case, we employ the BemouUi-Gaussian-mixture signal model proposed 
for DCS-AMP in Section |Vl The audio clip is a 7 second recording of a trumpet solo, and contains a 
succession of rapid changes in the trumpet's pitch. Such a recording presents a challenge for CS methods, 
since the signal will be only compressible, and not sparse. The clip, sampled at a rate of 11 kHz, was 
divided into T = 54 non-overlapping segments of length N = 1500. Using the discrete cosine transform 
(DCT) as a sparsifying basis, linear measurements were obtained using a time-invariant i.i.d. Gaussian 
sensing matrix. 

In Fig. [6] we plot the magnitude of the DCT coefficients of the audio signal on a dB scale. Beyond the 
temporal correlation evident in the plot, it is also interesting to observe that there is a non-trivial amount 
of frequency correlation (correlation across the index [n]), as well as a large dynamic range. We performed 
recoveries using four techniques: BG-AMP, GM-AMP (a temporally agnostic Bernoulli-Gaussian-mixture 
AMP algorithm with Z) = 4 Gaussian mixture components), DCS-(BG)-AMP, and DCS-GM-AMP (the 
BemouUi-Gaussian-mixture dynamic CS model described in Section |Vl with D = 4). For each algorithm, 
EM learning of the model parameters was performed using straightforward variations of the procedure 
described in Section |IVl with model parameters initialized automatically using simple heuristics described 
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Magnitude (in dB) of DCT Coefficients of Audio Signal 
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Fig. 6: DCT coefficient magnitudes (in dB) of an audio signal. 







Undersampling Rate 










o 


Algorithm 


BG-AMP 


-16.88 (dB) 1 09.11 (s) 


-11.67 (dB) 1 08.27 (s) 


-08.56 (dB) 1 06.63 (s) 


GM-AMP (D = 4) 


-17.49 (dB) i 19.36 (s) 


-13.74 (dB) 1 17.48 (s) 


-10.23 (dB) 1 15.98 (s) 


DCS-BG-AMP 


-19.84 (dB) 1 10.20 (s) 


-14.33 (dB) 1 08.39 (s) 


-11.40 (dB) 1 6.71 (s) 


DCS-GM-AMP (D = 4) 


-21.33 (dB) 1 20.34 (s) 


-16.78 (dB) 1 18.63 (s) 


-12.49 (dB) 1 10.13 (s) 



TABLE VI: Performance on audio CS dataset (TNMSE (dB) | Runtime (s)) of two temporally independent algoritlims, 
BG-AMP and GM-AMP, and two temporally structured algorithms, DCS-BG-AMP and DCS-GM-AMP. 

in ll33l . Moreover, unique model parameters were learned at each timestep (with the exception of support 
transition probabilities). Furthermore, since our model of hidden amplitude evolutions was poorly matched 
to this audio signal, we fixed a = 1. 

In Table |Vl] we present the results of applying each algorithm to the audio dataset for three different 
undersampling rates, 6. For each algorithm, both the TNMSE in dB and the runtime in seconds are pro- 
vided. Overall, we see that performance improves at each undersampling rate as the signal model becomes 
more expressive. While GM-AMP outperforms BG-AMP at all undersampling rates, it is surpassed by 
DCS-BG-AMP and DCS-GM-AMP with DCS-GM-AMP offering the best TNMSE performance. Indeed, 
we observe that one can obtain comparable, or even better, performance with an undersampling rate 5 = | 
using DCS-BG-AMP or DCS-GM-AMP, with that obtained using BG-AMP with an undersampling rate 
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E. Frequency Estimation 

In a final experiment, we compared the performance of DCS -AMP against techniques designed to 
solve the problem of subspace identification and tracking from partial observations (SITPO) ll52l . ll53l . 
which bears similarities to the dynamic CS problem. In subspace identification, the goal is to learn the 
low-dimensional subspace occupied by multi-timestep data measured in a high ambient dimension, while 
in subspace tracking, the goal is to track that subspace as it evolves over time. In the partial observation 
setting, the high-dimensional observations are sub-sampled using a mask that varies with time. The 
dynamic CS problem can be viewed as a special case of SITPO, wherein the time-t subspace is spanned 
by a subset of the columns of an a priori known matrix A^^\ One problem that lies in the intersection 
of SITPO and dynamic CS is frequency tracking from partial time-domain observations. 

For comparison purposes, we replicated the "direction of arrival analysis" experiment described in [53] 
where the observations at time t take the form 

= $WvWaW+eW, t = l,2,...,r (13) 

where G {0, l}*^^^ is a selection matrix with non-zero column indices Q^*) C {1, . . . , N}, V^*^ G 
£^NxK ^ Vandermonde matrix of sampled complex sinusoids, i.e., 

FW^[^(a;f ),..., ^(4))], (14) 
with v{u:f) = [1, e-^'^'^'^i" , . . . , ^i^^'^i'^ {N -i)^ Jt) ^ ^-.^^ ^{t) (z^K ^^^^^^ instantaneous 
amplitudes, and e^*) € ffi^ is additive noise with i.i.d. A/'(0,cjg) elements^ Here, {$^*^}^^ is known, 
while {u^*^}J^^ and {a^*)}^]^ are unknown, and our goal is to estimate them. To assess performance, 
we report TNMSE in the estimation of the "complete" signal {^'■^•'a^*)}^]^. 

We compared DCS-AMP's performance against two online algorithms designed to solve the SITPO 
problem: GROUSE JSa and PETRELS fSl. Both GROUSE and PETRELS return time-varying subspace 
estimates, which were passed to an ESPRIT algorithm to generate time-varying frequency estimates (as 
in ll53l ). Finally, time-varying amplitude estimates were computed using least-squares. For DCS-AMP, 
we constructed A^*^ using a 2x column-oversampled DFT matrix, keeping only those rows indexed by 
QW. DCS-AMP was run in filtering mode for fair comparison with the "online" operation of GROUSE 
and PETRELS, with / = 7 inner AMP iterations. 

"Code for replicating the experiment provided by the authors of 1531 . Unless otherwise noted, specific choices regarding 
{4*'} and {a(*'} were made by the authors of [53 1 in a deterministic fashion, and can be found in the code. 
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Problem Setup 






N = 256, M = 30, K = 5 


iV = 256, M = 10, K = 5 


iV = 1024, M = 120, K = 20 


s 


GROUSE 


-4.52 (dB) 1 6.78 (s) 


2.02 (dB) 1 6.68 (s) 


-4.51 (dB) 1 173.89 (s) 


Algorit 


PETRELS 


-15.62 (dB) 1 29.51 (s) 


0.50 (dB) 1 14.93 (s) 


-7.98 (dB) 1 381.10 (s) 


DCS-AMP 


-15.46 (dB) 1 34.49 (s) 


-10.85 (dB) 1 28.42 (s) 


-12.79 (dB) 1 138.07 (s) 



TABLE VIL Average perfoimance on synthetic frequency estimation experiment (TNMSE (dB) | Runtime (s)) of GROUSE, 

PETRELS, and DCS-AMP In all cases, T = 4000, = 10"^ 

The results of performing the experiment for three different problem configurations are presented in 
Table IVlIl with performance averaged over 100 independent realizations. All three algorithms were given 
the true value of K. In the first problem setup considered, we see that GROUSE operates the fastest, 
although its TNMSE performance is noticeably inferior to that of both PETRELS and DCS -AMP, which 
provide similar TNMSE performance and complexity. In the second problem setup, we reduce the number 
of measurements, M, from 30 to 10, leaving all other settings fixed. In this regime, both GROUSE and 
PETRELS are unable to accurately estimate {w^*^}, and consequently fail to accurately recover V''*^a(*\ 
in contrast to DCS-AMP. In the third problem setup, we increased the problem dimensions from the first 
problem setup by a factor of 4 to understand how the complexity of each approach scales with problem 
size. In order to increase the number of "active" frequencies from A' = 5 to = 20, 15 additional 
frequencies and amplitudes were added uniformly at random to the 5 deterministic trajectories of the 
preceding experiments. Interestingly, DCS-AMP, which was the slowest at smaller problem dimensions, 
becomes the fastest (and most accurate) in the higher-dimensional setting, scaling much better than either 
GROUSE or PETRELS. 

VII. Conclusion 

In this work we proposed DCS -AMP, a novel approach to dynamic CS. Our technique merges ideas 
from the fields of belief propagation and switched linear dynamical systems, together with a computa- 
tionally efficient inference method known as AMP Moreover, we proposed an EM approach that learns 
all model parameters automatically from the data. In numerical experiments on synthetic data, DCS-AMP 
performed within 3 dB of the support-aware Kalman smoother bound across the sparsity-undersampling 
plane. Repeating the dynamic MRI experiment from [Si], DCS -AMP slightly outperformed Modified-CS 
in MSE, but required less than 10 seconds to run, in comparison to more than 7 hours for Modified- 
CS. For the compressive sensing of audio, we demonstrated significant gains from the exploitation of 
temporal structure and Gaussian-mixture learning of the signal prior. Lastly, we found that DCS-AMP can 
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Fig. 7: The factor graph representation of the decomposition of ( I16t . 

outperfomi recent approaches to Subspace Identification and Tracking from Partial Observations (SITPO) 
when the underlying problem can be well-represented through a dynamic CS model. 

Appendix A 
The Basics of Belief Propagation and AMP 

In this appendix, we provide a brief primer on belief propagation and the Bayesian approximate 
message passing (AMP) algorithmic framework proposed by Donoho, Maleki, and Montanari Bill . In 
what follows, we consider the task of estimating a signal vector x € from linearly compressed and 
AWGN-corrupted measurements: 

y = Ax + ee£.^. (15) 



AMP can be derived from the perspective of loopy belief propagation (LBP) |23], a Bayesian inference 
strategy that is based on a factorization of the signal posterior pdf, p{x\y), into a product of simpler 
pdfs that, together, reveal the probabilistic structure in the problem. Concretely, if the signal coefficients, 
x, and noise samples, w, in ([TSl l are jointly independent such that p{x) = Y\n=iPi^ri) and p{y\x) = 
Y\m=i C^iym] dJu^i '^e)' t^cn the posterior pdf factors as 

M N 

)Y[P^^n), (16) 

m=l n=l 

yielding the factor graph in Fig. |7] 

In belief propagation ll24l . messages representing beliefs about the unknown variables are exchanged 
amongst the nodes of the factor graph until convergence to a stable fixed point occurs. The set of beliefs 
passed into a given variable node are then used to infer statistical properties of the associated random 
variable, e.g., the posterior mode, or a complete posterior distribution. The sum-product algorithm [23] 
is perhaps the most well-known approach to belief propagation, wherein the messages take the form of 
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probability distributions, and exact posteriors are guaranteed whenever the graph does not have cycles 
("loops"). For graphs with cycles, exact inference is known to be NP-hard, and so LBP is not guaranteed to 
produce correct posteriors. Still, it has shown state-of-the-art performance on a wide array of challenging 
inference problems, as noted in Section IIII-B I 

The conventional wisdom surrounding LBP says that accurate inference is possible only when the factor 
graph is locally tree-like, i.e., the girth of any cycle is relatively large. With ([TSl l. this would require that A 
is an appropriately constructed sparse matrix, which precludes some of the most interesting CS problems. 
In a remarkable departure from convention, Donoho, Maleki, Montanari, and Bayati demonstrated that 
LBP-based compressive sensing is not only feasible lITTl . ||3T1 for dense A matrices, but provably 
accurate |[T9l . In particular, they established that, in the large-system limit (i.e., as M,N — > oo with 
M/N fixed) and under i.i.d. sub-Gaussian A, the iterations of AMP are governed by a state-evolution 
whose fixed point — when unique — yields the true posterior means. Beyond its theoretical significance, 
AMP is important for its computational properties as well. As demonstrated in the original AMP work 
IfTTl . not only can LBP solve the compressive sensing problem (fTSl ). but it can do so much faster, and 
more accurately, than other state-of-the-art methods, whether optimization-based, greedy, or Bayesian. 
To accomplish this feat, ifTTl . lISTl proposed a specific set of approximations that become accurate in 
the limit of large, dense A matrices, yielding algorithms that give accurate results using only 2MN 
fiops-per-iteration, and relatively few iterations (e.g., tens). 

The specific implementation of any AMP algorithm will depend on the particular choices of likelihood 
and prior, but ultimately amounts to an iterative, scalar soft-thresholding procedure with a carefully 
chosen adaptive thresholding strategy. Deriving the appropriate thresholding functions for a particular 
signal model can be accomplished by computing scalar sum-product, or max-sum, updates of a simple 
form (see, e.g., ||36l Table 1]). 
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